Building A Tool For Annotating Reference In Discourse
Authors
Abstract
We discuss the development of a system for marking several types of reference to facilitate the analysis of reference in discourse. The tool is designed to be used in three applications: generating training data for machine learning of co-reference relations, evaluating theories of referring expression generation and resolution in texts, and developing theories for understanding reference in dialogs. The need to mark any of a broad set of relations which may span several levels of discourse structure drives the system architecture. The system has the ability to collect statistics over encoded relations and measure inter-coder reliability, and includes tools to increase the accuracy of the user’s markings by highlighting the discrepancies between two sets of markings. Using parsed corpora as the input further reduces the human workload and increases reliability.
Similar Resources
Annotating Discourse Relations with the PDTB Annotator
The PDTB Annotator is a tool for annotating and adjudicating discourse relations based on the annotation framework of the Penn Discourse TreeBank (PDTB). This demo describes the benefits of using the PDTB Annotator, gives an overview of the PDTB Framework and discusses the tool’s features, setup requirements and how it can also be used for adjudication.
Toward a Discourse Theory for Annotating Causal Relations in Japanese
We present a revised discourse theory based on segmented discourse representation theory and provide a method for building a Japanese corpus suitable for causal relation extraction. This extends and refines the framework proposed in Kaneko and Bekki (2014), and we evaluate our corpus and compare it with that work.
DialogueView: annotating dialogues in multiple views with abstraction
This paper describes DialogueView, a tool for annotating dialogues with utterance boundaries, speech repairs, speech act tags, and hierarchical discourse blocks. The tool provides three views of a dialogue: WordView, which shows the transcribed words time-aligned with the audio signal; UtteranceView, which shows the dialogue line-by-line as if it were a script for a movie; and BlockView, which ...
Annotating the Structure and Semantics of Fables
This paper outlines an annotation scheme we developed for a corpus of fables. Reference is made to previous studies on discourse structure and story grammar, as well as discourse relations and text coherence. The applicability and adequacy of the various frameworks for annotating and analysing fables are considered. The current work addresses several issues including the basic units for discour...
Annotating Subordinators in the Turkish Discourse Bank
In this paper we explain how we annotated subordinators in the Turkish Discourse Bank (TDB), an effort that started in 2007 and is still continuing. We introduce the project and describe some of the issues that were important in annotating three subordinators, namely karşın, rağmen and halde, all of which encode the coherence relation Contrast-Concession. We also describe the annotation tool.